Transit ridership and highway volumes for the Corridor studies are obtained by running alternative scenarios in the STOPS and SERPM models. The two models were developed differently and use different input data sets, although some of the data in both comes from the MPOs and transit agencies. The transit inputs are in two different formats: STOPS uses GTFS, whereas SERPM uses CUBE PT. Although the formats are not directly comparable, both should represent the transit networks. The other input is the socio-demographic, or land use, file. It exists in both models, but again in different formats and at different geographic levels. Since the transit ridership estimates for the Corridor studies are expected to rely on the STOPS model, the transit networks in STOPS and SERPM are not compared here. The land use data, however, is examined in detail.
The 2015 socio-economic data is developed through linear interpolation. Currently, the 2015 SE data exists in two different models:
The STOPS SE data is at the TAZ level, whereas the SERPM data is at the MAZ level. To compare the two inputs, the SERPM MAZ-level data is aggregated to TAZ, and this document reports all findings at the TAZ level. In principle the SE data in the two models should be the same and should originate from the same source. Because both models (STOPS and SERPM) were being updated continuously, the model data for 2040 could differ. This document summarizes those differences.
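The MAZ-to-TAZ aggregation described above can be illustrated with a minimal sketch. It uses base R only so that it runs without extra packages, and the input values are made up; the column names (mgra, TAZ, hh, pop, emp_total) mirror those appearing in the tables of this document, but the actual aggregation code used for the report is not shown here and may differ.

```r
# Hypothetical MAZ-level records (values are illustrative, not model data).
maz <- data.frame(
  mgra      = 1:5,
  TAZ       = c(10, 10, 11, 11, 11),
  hh        = c(40, 10, 25, 30, 5),
  pop       = c(100, 30, 60, 80, 12),
  emp_total = c(0, 15, 7, 3, 20)
)

# Sum each count variable over all MAZs that belong to the same TAZ.
taz <- aggregate(cbind(hh, pop, emp_total) ~ TAZ, data = maz, FUN = sum)
print(taz)
```

Note that the formula interface of aggregate() drops rows with missing values by default, so any NA handling would need to be decided before this step.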
The 2040 SE data exists in the following model locations:
#path <- getwd()
path <-"/Volumes/C/projects/SERPM_Compare/check_seData"
# data directories
dir <- "Corradino_SEData"
fdot.dir <- "FDOT_June_30_2016"
stops.dir <- "STOPS_SEData"
# file names
# TODO (ans): Replace maz_data.csv files with model_data.csv (which is more comprehensive data)
maz.data.files <- c("2010_maz_data.csv", "2015_maz_data.csv", "2040_maz_data.csv")
fdot.maz.files <- c("maz_data_IN_2040R.csv", "maz_data_IN_2040T.csv")
stops_mpo_shapeFile <- "simplified_MPOTAZPopEmp.shp"
taz_county_file <- "taz_county.csv"
# list of TAZs to check
check_taz <- c(76, 387, 979, 1596, 1598, 2253)
# Save R Objects for later use
save.RData.outputs <- TRUE

The SERPM maz data files are developed by FDOT with feedback from various agencies, including the three MPOs in the region. The future-year maz data file is updated continuously to reflect revised population and employment projections. As a result, there are several versions of the 2040 data with significant differences in the population, household and employment variables. As part of the Corridor studies effort, the source of the model data being used must be documented and the data validated.
The model data files delivered by Corradino were reviewed for data consistency across the three horizon years: 2010, 2015 and 2040. Some of the data fields are not consistent across all years. Two fields geoSRate and geoSRateNm exist in some maz_data.csv files but not in all.
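A field-consistency check of this kind could look like the sketch below. The column sets are hypothetical: the source only states that geoSRate and geoSRateNm are missing from some files, not which ones, so the assignment of fields to years here is purely illustrative.

```r
# Hypothetical column names per horizon-year file (illustrative only).
fields <- list(
  "2010" = c("mgra", "TAZ", "hh", "pop", "emp_total", "geoSRate", "geoSRateNm"),
  "2015" = c("mgra", "TAZ", "hh", "pop", "emp_total"),
  "2040" = c("mgra", "TAZ", "hh", "pop", "emp_total", "geoSRate", "geoSRateNm")
)

# Fields present in every file.
common <- Reduce(intersect, fields)

# Fields each file has beyond the common set.
extra <- lapply(fields, setdiff, y = common)
print(extra)
```

With real files, the column sets could be taken from the header row of each CSV, for example with names(read.csv(file, nrows = 1)).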
The 2015 SE data is developed through linear interpolation, so each 2015 value should lie between the corresponding 2010 and 2040 values. This section of the code checks for zones where households drop from 2010 to 2015 but rise again by 2040, or the reverse (a change in the direction of growth). The following table shows households for 2010, 2015 and 2040; differences of 5 households or fewer are ignored.
**The household variable in this maz_data.csv file is computed by aggregating PopSyn-3 outputs, so there is some over- or under-estimation of households at the MAZ level when compared to PopSyn-3 inputs.** The differences shown in the data below are within a reasonable range.
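As a worked illustration of the reasoning above, the sketch below interpolates a 2015 value linearly between 2010 and 2040 and flags a change in the direction of growth. The function names interpolate_linear and trend_reverses are hypothetical helpers introduced for this example, and the household counts are made up; the actual interpolation procedure used to produce the 2015 data is not documented here.

```r
# Linear interpolation between two horizon years (hypothetical helper).
interpolate_linear <- function(v_start, v_end, year,
                               year_start = 2010, year_end = 2040) {
  v_start + (year - year_start) / (year_end - year_start) * (v_end - v_start)
}

# TRUE where growth changes direction between the two periods
# (hypothetical helper; the product of the two differences is negative).
trend_reverses <- function(v10, v15, v40) {
  (v15 - v10) * (v40 - v15) < 0
}

# Illustrative household counts for three zones.
hh_2010 <- c(120, 300, 45)
hh_2040 <- c(180, 240, 45)
hh_2015 <- interpolate_linear(hh_2010, hh_2040, 2015)

# Interpolated values never reverse direction.
print(trend_reverses(hh_2010, hh_2015, hh_2040))

# A zone that drops and then recovers is flagged.
print(trend_reverses(100, 90, 120))
```

Because 2015 is 5 years into a 30-year span, the interpolated value sits one sixth of the way from the 2010 value to the 2040 value.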
check_hh <- data_all_years %>%
  # treat missing values as zero before differencing
  mutate(across(everything(), ~ replace(.x, is.na(.x), 0))) %>%
  mutate(diff_1015 = hh_2015 - hh_2010,
         diff_1540 = hh_2040 - hh_2015,
         # & and | are vectorized; && and || only compare single values
         check = ifelse((diff_1015 > 0 & diff_1540 < 0) | (diff_1015 < 0 & diff_1540 > 0), 1, 0)) %>%
  filter(check == 1, abs(diff_1540) > 5, abs(diff_1015) > 5)
check_hh <- check_hh %>%
select(TAZ, hh_2010, hh_2015, hh_2040, diff_1015, diff_1540)
# kable(check_hh, caption = "Zones with Inconsistent Households Trends", digits = 0, format.args = list(big.mark = ","))
datatable(check_hh, caption = "Zones with Inconsistent Households Trends")
# Save R Object file
if (save.RData.outputs) {
save(check_hh, file = "table_check_hh.RData")
}

The following table shows data for the selected TAZs: 76, 387, 979, 1596, 1598, 2253. These zones were selected based on a past review of the 2015 zonal data. The current model data shows a consistent growth rate across the household, population and employment variables between 2010, 2015 and 2040.
sel_data <- data_all_years %>%
filter(TAZ %in% check_taz) %>%
select(TAZ, pop_2010, pop_2015, pop_2040,
emp_total_2010, emp_total_2015, emp_total_2040,
hh_2010, hh_2015, hh_2040)
kable(sel_data, caption = "Selected TAZ from SERPM Data", digits = 0, format.args = list(big.mark = ","))

| TAZ | pop_2010 | pop_2015 | pop_2040 | emp_total_2010 | emp_total_2015 | emp_total_2040 | hh_2010 | hh_2015 | hh_2040 |
|---|---|---|---|---|---|---|---|---|---|
| 76 | 953 | 994 | 1,207 | 119 | 123 | 142 | 569 | 597 | 745 |
| 387 | 214 | 217 | 217 | 745 | 730 | 655 | 98 | 98 | 99 |
| 979 | 1,689 | 1,684 | 1,690 | 243 | 244 | 251 | 788 | 787 | 780 |
| 1,596 | 499 | 505 | 499 | 402 | 417 | 493 | 241 | 241 | 244 |
| 1,598 | 307 | 382 | 656 | 27 | 28 | 31 | 165 | 173 | 241 |
| 2,253 | 34 | 80 | 277 | 265 | 267 | 275 | 15 | 33 | 122 |
The figures below show growth rates by county for selected input variables: households (hh), population (pop), total employment (emp_total), college enrollment (college) and school enrollment (school).
Miami-Dade County: The growth rates look reasonable. The hh, pop and emp growth rates are at 25 percent, college enrollment growth is at 20 percent and school enrollment growth is at 5 percent.
Broward County: The hh growth rate looks reasonable and pop seems a bit low at 12 percent, but employment is projected to grow by only 5 percent, which needs to be double-checked with FDOT. The same issue applies to college and school enrollment.
Palm Beach County: The growth rates look reasonable. The hh, pop and emp growth rates are at 30 percent, and college and school enrollment growth are at 30 and 15 percent respectively.
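County growth rates of this kind can be computed as the percent change in county totals. The sketch below assumes the rates are measured from 2010 to 2040, which the text does not state explicitly, and uses made-up zone totals with placeholder county labels.

```r
# Hypothetical zone-level population with a county label (illustrative values).
zones <- data.frame(
  county   = c("A", "A", "B", "B"),
  pop_2010 = c(1000, 3000, 2000, 2000),
  pop_2040 = c(1300, 3700, 2200, 2280)
)

# County totals for each year, then percent growth between them.
totals <- aggregate(cbind(pop_2010, pop_2040) ~ county, data = zones, FUN = sum)
totals$growth_pct <- 100 * (totals$pop_2040 / totals$pop_2010 - 1)
print(totals)
```

Computing the rate from county totals, rather than averaging zone-level rates, keeps small zones with large percentage swings from dominating the result.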
The two official 2040 maz_data.csv files were downloaded from the FDOT website in June 2016:
These two data sets are compared with the Corradino delivered 2040 data to make sure the model is using the official version and to document the source of model data being used for the corridor studies.
Several tabulations were computed to ensure that the model data being used for the corridor studies comes from the official FDOT Cost Feasible scenario. The table below compares the 2040 Cost Feasible data with the Corradino 2040 model data and shows zero differences. The chart below is a scatter plot of the population variable from the two data sets, which shows that they are the same.
| mgra | TAZ.Corradino | hh.Corradino | pop.Corradino | emp_total.Corradino | TAZ.FDOT | hh.FDOT | pop.FDOT | emp_total.FDOT | diff.hh | diff.pop | diff.emp | pop.bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
The 2040 LRTP data differs significantly from the 2040 Cost Feasible data, and therefore from the Corradino data as well. There are too many zones to display the differences in tabular form. The plot below is a scatter plot of the population data from the two data sets.
| mgra | TAZ.Corradino | hh.Corradino | pop.Corradino | emp_total.Corradino | TAZ.FDOT | hh.FDOT | pop.FDOT | emp_total.FDOT | diff.hh | diff.pop | diff.emp | pop.bin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2901 | 43 | 169 | 0 | 2901 | 43 | 172 | 0 | 0 | -3 | 0 | -1000 to -1 |
| 2 | 2902 | 9 | 23 | 1337 | 2902 | 9 | 23 | 1337 | 0 | 0 | 0 | 0 |
| 3 | 2903 | 497 | 1694 | 379 | 2903 | 497 | 1685 | 379 | 0 | 9 | 0 | 1 to 1000 |
| 4 | 2903 | 273 | 984 | 21 | 2903 | 273 | 1001 | 21 | 0 | -17 | 0 | -1000 to -1 |
| 5 | 2903 | 383 | 1306 | 86 | 2903 | 383 | 1308 | 86 | 0 | -2 | 0 | -1000 to -1 |
| 6 | 2903 | 212 | 861 | 14 | 2903 | 212 | 841 | 14 | 0 | 20 | 0 | 1 to 1000 |
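A comparison table like the one above could be assembled roughly as follows. This is a sketch under stated assumptions, not the report's actual code: it joins the two data sets on mgra, computes the difference as Corradino minus FDOT (consistent with the signs in the table), and bins the result with labels matching the pop.bin column. The three rows reuse the population values of mgra 1 to 3 from the table; the break points assume integer differences.

```r
# Population for mgra 1-3, taken from the comparison table above.
corradino <- data.frame(mgra = 1:3, pop = c(169, 23, 1694))
fdot      <- data.frame(mgra = 1:3, pop = c(172, 23, 1685))

# Join on mgra; the suffixes reproduce the column naming used in the table.
cmp <- merge(corradino, fdot, by = "mgra", suffixes = c(".Corradino", ".FDOT"))
cmp$diff.pop <- cmp$pop.Corradino - cmp$pop.FDOT

# Bin the differences; intervals are right-closed, so with integer
# differences the bins are -1000..-1, exactly 0, and 1..1000.
cmp$pop.bin <- cut(cmp$diff.pop,
                   breaks = c(-1001, -1, 0, 1000),
                   labels = c("-1000 to -1", "0", "1 to 1000"))
print(cmp)
```

Differences outside the range of the breaks would be returned as NA by cut(), so the outer breaks would need widening for larger discrepancies.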
The latest South East Florida Regional STOPS model was downloaded from the FDOT page and reviewed. As part of the review, the model land use data and observed APC counts were checked. Since the Corridor studies use both the SERPM and STOPS models, it is important to ensure that the input data is consistent between them. The downloaded SEFL STOPS model contains 2010, 2015 and 2040 population and employment data at the TAZ level. Per the SEFL STOPS model documentation, the 2014 data is computed by interpolating between 2010 and 2040. The STOPS model uses only the population and employment variables; the household variable is not used and is therefore not provided in the data set.
This data is clearly different from SERPM 2015 MAZ data.
# Read data from stops input
shape <- readOGR(paste0(path,"/",stops.dir,"/",stops_mpo_shapeFile), layer = "simplified_MPOTAZPopEmp", verbose = FALSE)
stops_se <- shape@data
stops_sel_data <- stops_se %>%
filter(TAZ_REG %in% check_taz) %>%
select(TAZ_REG, POP_10, POP_15, POP_40,
TOTE_10, TOTE_15, TOTE_40)
kable(stops_sel_data, caption = "Selected TAZ from STOPS Data", digits = 0, format.args = list(big.mark = ","))

| TAZ_REG | POP_10 | POP_15 | POP_40 | TOTE_10 | TOTE_15 | TOTE_40 |
|---|---|---|---|---|---|---|
| 76 | 953 | 994 | 1,202 | 119 | 123 | 142 |
| 387 | 214 | 214 | 216 | 745 | 730 | 655 |
| 979 | 1,689 | 1,689 | 1,691 | 243 | 244 | 251 |
| 1,596 | 499 | 498 | 495 | 402 | 417 | 493 |
| 1,598 | 307 | 364 | 649 | 27 | 28 | 31 |
| 2,253 | 34 | 75 | 279 | 265 | 267 | 275 |
The table below shows the number of TAZs by range of population and employment difference. TAZs with no difference are not tabulated.
| bin | diff.pop_15 | diff_emp_15 |
|---|---|---|
| -5000 to -500 | 5 | NA |
| -500 to -100 | 21 | NA |
| -100 to -50 | 14 | NA |
| -50 to -20 | 50 | NA |
| -20 to 0 | 1445 | 177 |
| 0 to 20 | 1496 | 885 |
| 20 to 50 | 88 | 1 |
| 50 to 100 | 44 | NA |
| 100 to 500 | 27 | 2 |
| 500 to 5000 | 2 | 1 |
The plot below shows population difference between the two data sets.
The plot below shows employment difference between the two data sets.
The map below shows the population difference for 2010, 2015 and 2040 by TAZ between the two data sets. The 2010 population is the same in the two models (SERPM and STOPS), whereas the 2040 data differs across most TAZs. The 2040 difference likely carried over into 2015 through interpolation.
The 2010 employment data is the same in the two models (SERPM and STOPS), whereas the 2015 data differs across most TAZs. Five TAZs show different employment data for 2040; the table below lists them.
tabulate_emp_diff_2040 <- df %>%
filter(diff_emp_40 != 0) %>%
select(TAZ, emp_total_2040, TOTE_40, diff_emp_40)
kable(tabulate_emp_diff_2040)

| TAZ | emp_total_2040 | TOTE_40 | diff_emp_40 |
|---|---|---|---|
| 825 | 8160 | 250 | 7910 |
| 854 | 89 | 0 | 89 |
| 864 | 3070 | 386 | 2684 |
| 1058 | 3000 | 0 | 3000 |
| 2409 | 1831 | 1678 | 153 |
The map below shows population difference for 2010, 2015 and 2040 by TAZ between the two data sets.